Mitigating the effects of measurement noise on Granger causality 



Hariharan Nalatore^'B Govindan Rangarajan, 2 '0 and Mingzhou Ding 1 '!! 

The J. Crayton Pruitt Family Department of Biomedical Engineering, 
University of Florida, Gainesville, FL 32611, USA 
2 Department of Mathematics, Indian Institute of Science, Bangalore - 560 012, India 

Computing Granger causal relations among bivariate experimentally observed time series has 
received increasing attention over the past few years. Such causal relations, if correctly estimated, 
can yield significant insights into the dynamical organization of the system being investigated. 
£ — ■ Since experimental measurements are inevitably contaminated by noise, it is thus important to 

understand the effects of such noise on Granger causality estimation. The first goal of this paper 
is to provide an analytical and numerical analysis of this problem. Specifically, we show that, due 
£N) ■ to noise contamination, (f) spurious causality between two measured variables can arise and (2) 

true causality can be suppressed. The second goal of the paper is to provide a denoising strategy to 
mitigate this problem. Specifically, we propose a denoising algorithm based on the combined use of 
the Kalman filter theory and the Expectation-Maximization (EM) algorithm. Numerical examples 
' are used to demonstrate the effectiveness of the denoising approach. 
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■ I. INTRODUCTION 

Granger causality [1| has become the method of choice to determine whether and how two time series exert causal 
influences on each other. In this method one starts by modeling simultaneously acquired time series as coming from 
a multivariate or vector autoregressive (VAR) stochastic process. One time series is said to have a causal influence on 
the other if the residual error in the autoregressive model of the second time series (at a given point of time) is reduced 
^•-y by incorporating past measurements from the first. This method and related methods have found applications in a 
Ji^ wide variety of fields including physics 0, H, 0, [E IE 0] j economics (T), [l(J EL E2] and neuroscience d, |j| . Its nonlinear 
Q-i' extension has recently appeared in [l3| and has been applied to study problems in condensed matter physics [l4j. 

The statistical basis of Granger causality estimation is linear regression. It is known that regression analysis is sensi- 
— i . tive to the impact of measurement noise [151 ] . Given the inevitable occurrence of such noise in experimental time series, 
^ ' it is imperative that we determine whether and how such added noise can adversely affect Granger causality estima- 
tion. Previous studies [r| have suggested that such adverse effects can indeed occur. In this paper, we make further 
progress by obtaining analytical expressions that explicitly demonstrate how the interplay between measurement noise 
and system parameters affects Granger causality estimation. Moreover, we show how this deleterious effect of noise 
can be reduced by a denoising method, which is based on the Kalman filter theory and the Expectation-Maximization 
(EM) algorithm. We refer to our denoising algorithm as the KEM (Kalman EM) denoising algorithm. 

The organization of this paper is as follows. In Section 2, we start by introducing an alternative formulation of 
Granger causality [17] and proceed to outline a framework within which the effects of added (measurement) noise on 
the estimation of directional influences in bivariate autoregressive processes can be addressed. To simplify matters, 
we then consider a bivariate first order autoregressive (AR(1)) process in Section 3. Here explicit expressions for the 
effect of noise on Granger causality are derived. These expressions allow us to show that, for two time series that are 
unidircctionally coupled, spurious causality can arise when noise is added to the driving time series and true causality 
can be suppressed by the presence of noise in either time series. The theoretical results are illustrated by numerical 
simulations. In Section 4, we briefly introduce the KEM denoising algorithm and apply it to the example considered 
in Section 3. Our results show that the KEM algorithm can mitigate the effects of noise and restore the true causal 
relations between the two time series. In section 5, we consider a coupled neuron model which produces time series 
that closely resemble that recorded in neural systems. The effect of noise on Granger causality and the effectiveness 
of the KEM algorithm in mitigating the noise effect are illustrated numerically. Our conclusions are given in Section 
6. 
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II. THEORETICAL FRAMEWORK 

Consider two time series X(t) and Y(t). To compute Granger causality, we model them as a combined bivariate 
autoregressive process of order p. In what follows, the model order p is assumed to be known, since this aspect is not 
central to our analysis. The bivariate autoregressive model can then be represented as: 



j2[a k X(t -k) + b k Y(t - k)] = Et{t), (1) 

P 

Y^[c k X{t-k) + d k Y(t-k)]=E 2 {t), (2) 

fc=0 

where a k , b k , c k , and d k are the AR coefficients and Ei(t) are the temporally uncorrelated residual errors. 

For our purposes, it is more convenient to rewrite the above bivariate process as two univariate processes (this can 
always be done according to [13] ) : 

P 1 (B)X(t)=Z(t); P2(B)Y(t) = v (t), (3) 

where B is the lag operator defined as B k X(t) = X(t — k) and Pi and P2 are polynomials (of possibly infinite order) 
in the lag operator B. It should be noted that the new noise terms £(i) and r](t) are no longer uncorrelated. Let 
712 (k) denote the covariance at lag k between these two noises. 

7l2 (fc) = cov(£(i)),7 7 (i-fc)) k = ... ,-1,0,1.... (4) 

A theorem by Pierce and Haugh [13] states that Y(t) causes X(t) in Granger sense if and only if 

712 (&) ¥= for some k > 0. (5) 

Similarly X{t) causes Y(t) if and only if 712(A) ^ for some k < 0. 

Now we add measurement noises £'(£) and 77' (t) to X(t) and Y{t) respectively: 

x^(t) = x(t)+e(t), (6) 

Y^(t)=Y(t)+ V '(t). (7) 

Here £'(i), r) 1 (t) are uncorrelated white noises that are uncorrelated with X(t), Y(t),£,(t) and r](t). Following Newbold 
[lH, the above equations can be rewritten as 

Pi(5)xW(t) = Px(B)X(t) + Pi(B)e(t), (8) 

P 2 (B)Y^(t) - P 2 (B)Y(t) + P 2 (BW(t). (9) 

Using Eq. (J3j) we get 

P 1 (B)X^(t)=at)+Pi(B)Z'(t), (10) 

P 2 (B)Y^(t)= V (t)+P 2 (B)rj'(t). (11) 

Following the procedure in Granger and Morris the linear combination of white noise processes on the right 
hand sides can be rewritten in terms of invcrtible moving average processes [19J : 

at) + Pim'(t) = p 3 (B)^ c \t), (12) 

V (t) + P 2 {B)i(t) = P 4 (B) V W(t), (13) 
where £^ c ' and rf^ are again uncorrelated white noise processes. Thus we get 

P^ l {B)P 1 {B)X^{t) = ^\t), 

p- 1 (B)P 2 (B)Y^(t)^ V ^(t). (14) 

This is again in the form of two univariate AR processes. Therefore the theorem of Pierce and Haugh can be applied 
to yield the result that the noisy signal Y^(t) causes X^ c \t) in Granger sense if and only if 

7 ( c 2 ) (fc)^cov(e( c )(t),ry( c )(^fc))^0, (15) 
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for some k > 0. Similarly X<- C \t) cause Y<- C >(t) if and only if 

7l (c 2 W0, (16) 

for some k < 0. 

We can relate 7^ to 712 as follows. Consider the corresponding covariance generating functions (which are nothing 
but the ^-transforms of the cross-covariances) 



712(2) = 2^ 712(*)**, 

fc= — OO 

oc 

7#(*) = E 7#(*)* fc - (17) 

k— — oc 

We can show that [l6| 

7g( Z ) = P 3 - 1 (z)P 4 - 1 (^- 1 )7i 2 W- (18) 

Even if 712 (fc) = for all k < (i.e. X does not cause Y) it is possible that J^(k) 7^ for some negative k because 
of the additional term P 3 ~ 1 (z)P 4 ~ 1 (z _1 ) that has been introduced by the measurement noise. This gives rise to the 
spurious Granger causality, (X^ c > causes Y~( c )), which is a consequence of the added measurement noise. 



III. A BIVARIATE AR(1) PROCESS 

In the previous section, we demonstrated that measurement noise can affect Granger causality. But the treatment 
given was quite general in nature. In this section we specialize to a simple bivariate AR(1) process and obtain explicit 
expressions for the effect of noise on Granger causality. 

Consider the following bivariate AR(1) process 

X(t) = aX(t-l)+bY(t-l)+Ih(i), 

Y[t) = dY{t-l) + E 2 (t). (19) 

From the above expressions, it is clear that Y drives X for nonzero values of b and X does not drive Y in this model. 
More specifically, we see that Y at an earlier time t—1 affects X at the current time t. There is no such corresponding 
influence of XonY. 

When noises and rj'if) with variances er|, and ai,, respectively, are added to the data generated by Eq. (| 19[) . 
after some algebra (see Appendix for details), we find the following expressions for Pi{B) and Pa{B): 

P 3 (B) = l + a' 1 B + a' 2 B 2 - 1 P 4 (P) = 1 - d ' B. (20) 

Here 



where 



d'= S± ^\ (21) 



1 1 di 

,= - +< Q + -_f. (22) 
a d a*, 



The expressions for a[ and a' 2 are very long and for our purposes it is sufficient to note that they go to zero as the 
added noise goes to zero (as expected). We see that \s\ > 2 for any value of d, a 2 and a 2 ,. Therefore \/ S 2 ~ 4 and 
hence d 1 are well defined. We also have the following results: 

a) As \d\ -» 0, \d '| < \d\ -> 0; 



b)Asd-l,d + A- /1 + 4/ 
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c) As the ratio > 0, d — ► d; 

v' 

a 2 i 

d) As the ratio > oo, d — > 0. 

Substituting the expressions for Pz{B) and Pi{B) in Eq. (TTB")) we get 

7$(z) = (1 + + ^^(l - d ' z- 1 )- 1 ^). (23) 
We now expand both sides in powers of z: 

■ + -ytfi-Vz- 1 + 7#(0) + 7^(1)* + • • • = (1 - <h* + (a? ~ 4)z 2 + • • •) 
x (1 + d ' z- 1 + d' 2 z- 2 + ...)(...+ 7l2 (-l)z- 1 + 7l2 (0) + 7 i 2 (l)z + •••)■ (24) 

Collecting terms proportional to z ,z ,z etc., we obtain the following expressions for the cross covariances at lag 
-1, and 1: 

7#(-l) = d '(1 - a/d ' + . . .)(7i2(0) + d ' 7l2 (l) + ...), (25) 
7^(0) = (l- a ;d' + ...)(7i2(0)+d '712(1) + ...), (26) 
7^(1) = 712(1) - a[ 7i 2 (0) - a[d ' 7l2 (l) + . . . . (27) 

We observe that j[^(k) for k < (and in particular, 7^2 (—1)) is n0 longer zero, implying that the drives Y^ c \ 
thus giving rise to a spurious causal direction. The spurious causality term 712 (— 1) is proportional to d . This can 
be shown to be true for all the other spurious terms j[^(k), k < — 1 as well. Hence they all go to zero if d — > 
(i.e. if Y has no measurement noise). This happens even if a l and a 2 are non-zero (i.e. even if X measurement is 
contaminated by noise). Hence we arrive at an important conclusion that if Y is driving X, only measurement noise 
in Y can cause spurious causality. If Y has no measurement noise, no amount of measurement noise in X can lead to 
spurious causality. Further, using the asymptotic properties of d listed earlier, we can easily see that the magnitude 
of the spurious causality increases as d ~ > 1 and as the ratio cr 2 /a 2 , — > 0. 

The foregoing demonstrates that noise can lead to spurious causal influences that are not part of the underlying 
processes. Here we show that the true causality terms (712 (k) for k > 0) are also modified by the presence of noise. 
They undergo a change even if d =0. For example, 712(1) is changed to 712(1) — 02712(0) even if d =0. Therefore, 
it is quite possible that even a true causal direction can be masked by added noise and the measurement noises in 
both time series contribute to this suppression. As the ratios cr^/a 2 , and <t 2 /<j 2 , — ► 00, a 1 ,a 2 ,d all go to zero and 

7^ — > 712, as expected. 

We make one final observation. If we replace z by e l27r ^ (where / is the frequency) in the covariance generating 
function [cf. Eq. 1|17|) ] we obtain the cross spectrum. Hence all the above results carry over to the spectral/frequency 
domain. 

To illustrate the above theoretical results, we estimate Granger causality spectrum (in the frequency domain) for a 
bivariate AR process numerically. First, we briefly summarize the theory behind this computation 0. The bivariate 
AR process given in Eq. (fT]) can be written as: 

p 

Y J A{k)Z{t-k)=E{t), (28) 

fc=0 

where Z{t) = [X(t),Y(t)] T ; E{t) = [E^t), E 2 (t)] T and 

a W = (:«>:£), (29 , 

for 1 < k < p. A(0) is the 2x2 identity matrix. Here, E(t) is a temporally uncorrelated residual error with covariance 
matrix E. We obtain estimates of the coefficient matrices A(k) by solving the multivariate Yule- Walker equations [2(| 
using the Levinson- Wiggins- Robinson (LWR) algorithm [2l|. From A(k) and E we estimate the spectral matrix S(f) 
by the relation 



S(f) = #(/)£#*(/), 



(30) 
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where H(f) — E&=o A(k)e 2mk *] 1 is the transfer function of the system. 
The Granger causality spectrum from Y to X is given by HH2] (see also H|) 

Jr-^x (/) = - ln[l ] • (31) 

Here, En, E22 and £12 are the elements of £ and Sii(/) is the power spectrum of X at frequency /. Hij(f) is the 
{£7} element of the transfer function matrix H(f). Similarly, the Granger causality spectrum from X to Y is defined 
by 

(£ U -?^)|tf2l(/)| 2 

(/) = - ln[l ] , (32) 

and S22H) is the power spectrum of F at frequency /. 

We now estimate the Granger causality spectrum for the specific AR(1) process given in Eq. (JTHJ) where Y drives 
X and X does not drive Y, The parameter values used are a — 0.4, b = 0.6, d = 0.9, <7£ = 0.2 and er r) = 1.0. We 
obtain two time series X and Y by numerically simulating the VAR model and then adding Gaussian measurement 
noise with ay = 0.2 and a v > — 2.5. For concreteness we assume that each time unit corresponds to 5 ms. In other 
words, the sampling rate is 200 Hz, and thus the Nyquist frequency is 100 Hz. The dataset consists of one hundred 
realizations, each of length 250 ms (50 points). These 100 realizations are used to obtain expected values of the 
covariance matrices in the LWR and KEM algorithms (see next section). The Granger causality spectra Ix^y{f) 
and Iy— *x{f) are plotted in Figure 1. The solid lines represents the true causality spectra while the dashed lines 
represent the noisy causality spectra. 

Similarly, we also simulated the following bivariate AR(2) process: 

X(t) = oJC(i-l) + 6y(i-l) + -Ei(t), 

Y{t) = d 1 Y(t-l) + d 2 Y(t-2)+E 2 (t). (33) 

The values of the parameters a and b used were the same as in the previous AR(1) process example (Eq. I19[) except 
for the values of the new parameters d\ and c?2 which were chosen to be 0.4 and 0.5 respectively. We again obtain 
two time series X and Y and then added Gaussian measurement noise with a^< — 0.2 and ay = 2.5 to X and Y 
respectively. The Granger causality spectra Ix^yH) and Iy^x{f) are plotted in Figure 2. As before, the solid lines 
and dashed lines represent the true causality spectra and noisy causality spectra, respectively. 

We observe that the measurement noise has a dramatic effect in both of these cases: It completely reverses the true 
causal directions. For the noisy data, X appears to drive Y and Y does not appear to drive X. 

The above theoretical and numerical results bring out clearly the adverse effect that noise can have on correctly 
determining directional influences. The same is also true for other quantities like power spectrum and coherence. 
Therefore it is imperative that the effect of noise be mitigated to the extent possible. 



IV. THE KEM DENOISING ALGORITHM 



In the previous section we have seen that noisy data can lead to grave misinterpretation of directional influences. 
We now provide a practical solution to this problem by combining the Kalman smoother with the Expectation- 
Maximization al gori thm [24J. The detailed algorithm is long and tedious. We outline the main logical steps below. 

Kalman filter [25j is a standard algorithm for denoising noisy data. To apply this, we first need to recast a VAR 
process with measurement noise in the so-called state-space form. This is nothing but the difference equation analogue 
of converting a higher order differential equation to a system of first order differential equations. Once this is done, 
our VAR model takes on the following form: 

x t+ i = Ax t +w t+i , (34) 
yt = Cx t +v t . (35) 

Here x t is an M x 1 ("true") state vector at time t. A is an M x M state matrix. w t is a zero mean Gaussian 
independent and identically distributed random variable with covariance matrix Q. Bivariate AR(p) models can 
be put in the form x t+ i = Ax t + w t+ i by defining M — 2p auxiliary variables Xi.t- The N x 1 vector y t is the 
observed/measured value of x t in N channels. C is an N x M observation matrix and is a fixed, known matrix for 
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VAR models. Hence we will ignore this in future discussions. The N x 1 vector v t is the measurement noise which is 
zero mean, Gaussian, independent and identically distributed with covariance matrix R. 

Kalman filter, however, can not be directly applied to denoise experimental or observed data since it assumes the 
knowledge of the model describing the state space dynamics. In practice, such knowledge is often not available. To 
get around this problem, we apply the Kalman smoother in conjunction with the Expectation and Maximization 
algorithm [24|, HE H3, 0j- Thus, this denoising algorithm will henceforth be called the KEM algorithm. In this 
algorithm, one follows the standard procedure for estimating state space parameters from data using the maximum 
likelihood method. The appropriate likelihood function in our case is the joint log likelihood logP({x}, {y}) where 
{x} denotes {x t } (for all t) and similarly for {y}. In the usual maximum likelihood method, P would not depend on 
x and we would therefore maximize the above quantity directly (conditioned on the observed y t values) and obtain 
the unknown state space parameters. But in our case, P depends on x which is also unknown. To get rid of x, we 
take the expected value of the log likelihood 

= P[logP({x},{y}) | {y}]. 

As usual, we have conditioned the expectation on the known observations {y}. 

To compute O, it turns out we need the expectations of x and xx T (where T denotes the transpose) conditioned 
on y. These expectations are obtained by applying the Kalman smoother on the noisy data. We use the Kalman 
smoother and not the Kalman filter since we are utilizing all the observations y instead of only the past observations. 
This is the appropriate thing to do in our case since we are performing an off-line analysis where all observations are 
known. In other words, in Kalman smoother, we perform both a forward pass and a backward pass on the data in 
order to make use of all observations. 

To apply the Kalman smoother, however, we still need the state space model parameters (just as in the Kalman 
filter case). To circumvent this problem, we start with initial estimates for these parameters (A, Q and R) as follows. 
From the noisy data, using the LWR algorithm, we obtain the VAR model coefficient matrices |8j. Then a standard 
transformation [25| is used to put these matrices in the state space form giving the initial estimate for A. The initial 
estimate of Q is taken to be the identity matrix following the standard procedure [2^|. The initial estimate of R is 
taken to be half the covariance matrix at lag zero of the noisy data. The approximate model order can be determined 
by applying the AIC criterion [29( in the LWR algorithm. This step is admittedly rather ad hoc. Further studies to 
optimize the above initial estimates and the VAR model order p are currently being carried out. Once we have initial 
estimates of the model parameters, we can apply the Kalman smoother to obtain the various conditional expectations 
and evaluate the expected log likelihood O. This is called the expectation (E) step. 

Next, we go to the maximization (M) step. Each of the parameters A, Q, R etc is re-estimated by maximizing 
O. Using these improved estimates, we can apply the E step again followed by the M step. This iterative process 
is continued till the value of log likelihood function converges to a maximum. We could now directly use the VAR 
parameters estimated from the KEM algorithm for further analysis as is usually done. But here we prefer to use the 
following procedure which was found to yield better performance. The final denoised data (that is, the estimate of x 
obtained from the KEM algorithm) is treated as the new experimental time series and subjected to parametric spectral 
analysis from which Granger causality measures can be derived. The Matlab code implementing this algorithm for 
our applications is available from the authors upon request. 

We have compared the denoising capabilities of the KEM algorithm with two widely used algorithms, the higher- 
order Yule- Walker (HOY) method [3(3] and the overdetermined higher-order Yule- Walker method (3l|. We find that 
the denoising capabilities of the KEM algorithm is superior. Detailed results will be presented elsewhere. In Figure 
3, we explicitly show that KEM algorithm performs better than the HOY method (see below). 

The KEM algorithm is applied to denoise the data shown in Figures 1 and 2. Figure 3 displays the same exact 
Granger causality spectra (solid lines) as that in Figure 1 and the Granger causality spectra (dashed lines) obtained 
from the denoised data using KEM algorithm. Causality spectra obtained using HOY method is also shown (as dotted 
lines). It is clear that the KEM method performs better. In Figure 4, the solid lines again represent the same exact 
Granger causality as that in Figure 2 and the dashed lines represent the Granger causality spectra obtained from 
the denoised data of a bivariate AR(2) process. We see that the correct causal directions are recovered and that the 
denoised spectra are reasonably close to the true causality spectra for both AR(1) and AR(2) process. We stress that 
these results are achieved without assuming any knowledge of the VAR models [Eqs. [19] and [33] that generated the 
original time series data. 

V. CAUSAL RELATIONS IN A NEURAL NETWORK MODEL 

In this section, we analyze the effect of noise on time series generated by a neural network model. We first 
demonstrate the effect of measurement noise on causality directions and then the effect of applying the KEM algorithm 
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on the noisy data. 

Our simulation model comprises two coupled cortical columns where each column is made up of an excitatory and 
an inhibitory neuronal population [32| . The equations governing the dynamics of the two columns are given by 

x'i + (a + b)xi + abxi = -k ei Q(yi(t),Q m o) + hjQ(%j(t),Qmo) + €xi(t), (36) 
y'i + (a + b)y\ + aby t = k ie Q(xi(t), Q m0 ) + £ Vi (t), (37) 

where i ^ j = 1,2. Here x and y represent local field potentials (LFP) of the excitatory and inhibitory populations 
respectively, ki e > gives the coupling gain from the excitatory (x) to the inhibitory (y) population, and fe e i > is the 
strength of the reciprocal coupling. The neuronal populations are coupled through a sigmoidal function Q{x, Q m o) 
which represents the pulse densities converted from x with Q m Q a modulatory parameter. The function Q(x, Q m o) is 
defined by 

e(x,c m o)={«r |I_ """ ,),OM1 ^<::: • < 38 > 

where uo = — ln[l + ln(l + q^)]- The coupling strength fey is the gain from the excitatory population of column j to 
the excitatory population of column i, with fey = for i — j. The terms £(£) represent independent Gaussian white 
noise inputs given to each neuronal population. 

The parameter values used were: a = 0.22/ms, b — 0.72/ms, k; Le = 0.1, k e i = 0.4, ki2 = 0, fei = 0.25 and Q m o = 5. 
The standard deviation for the Gaussian white noise was chosen as 0.2. Assuming a sampling rate of 200Hz, two 
hundred realizations of the signals were generated, each of length 30 s (6,000 points). 

We now restrict our attention to the variables x\(t) and xa(£). Measurement noises (Gaussian white noises with 
standard deviations 2.0 and 3.0 respectively) were added to these variables. From the model it is clear that x\(t) 
should drive X2{t) since ku = while &2i = 0.25. The results of applying Granger causality analysis (using a VAR 
model of order 7) on these two variables is shown in Figure 5. The solid lines represent the causality spectra for the 
noise-free data. The dashed lines represents the causality spectra for the noisy data. It is clear that the measurement 
noise has an effect on the causal relations by significantly reducing the true causality magnitude. In contrast to the 
example in Section 3, however, no spurious causal direction is generated here, despite the fact that both time series 
are contaminated by measurement noise. Next, we applied the KEM algorithm to denoise the noisy data. When 
Granger causality analysis is performed on the denoised data, we obtain causality spectra that are closer to the true 
causality spectra (see Figure 6). We note that the KEM algorithm is not able to completely remove the noise as the 
denoised spectra are still quite different from the true spectra. 

To show that the denoised Granger spectrum is significantly different from that of the noisy data we use the 
bootstrap approach [33| to establish the significant difference between the two peaks observed in Granger causality 
spectrum of Figures 5 and 6 (shown by dashed lines in these Figures) . One thousand rcsamples of noisy data and the 
denoised data were generated by randomly selecting trials with replacement. It should be noted that in any selected 
trial, the entire multichannel data is taken as it is thus preserving the auto and cross correlation structures. Thus, 
we employ a version of block bootstrap method [33[ . The peak values of Granger causality were computed for each 
resample using both noisy data and denoised data. Let us denote these peak values by the random variables Z\ and 
Zi respectively. The two population Student t-test was performed to determine whether the means of Z\ and Zi are 
different at a statistically significant level. 

The null hypothesis was that the means of the two populations Z\ and Zi are equal. The t value was found to 
be very large: 4.6446 * 10 3 and corresponds to a two-tailed p value less than 0.0001. Thus the null hypothesis that 
the two groups do not differ in mean is rejected. This establishes the fact that the peak of the Granger causality 
spectrum of the denoised data is significantly higher than that of the noisy data. Figure 7 shows the plot of Granger 
causality for the direction x\ — > xi along with 95% confidence intervals. The 95% confidence intervals are calculated 
as I Xl -> X2 (f) ± 1.96ctb (for each frequency /) where o\b is the sample standard deviation of the 1000 bootstrap 
replications of I Xl ^ X2 (f). 

VI. CONCLUSIONS 

Our contributions in this paper are two fold. First, we demonstrate that measurement noise can significantly 
impact Granger causality analysis. Based on analytical expressions linking noise strengths and the VAR model 
parameters, it was shown that spurious causality can arise and that true causality can be suppressed due to noise 
contamination. Numerical simulations were performed to illustrate the theoretical results. Second, a practical solution 
to the measurement noise problem, called the KEM algorithm, was outlined, which combines the Kalman filter theory 
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with the Expectation and Maximization (EM) algorithm. It was shown that the application of this algorithm to 
denoise the noisy data can significantly mitigate the deleterious effects of measurement noise on Granger causality 
estimation. It is worth noting that, despite the fact that the adverse effect of measurement noise on Granger causality 
has been known since 1978 [lg|, mitigation of such effect has received little attention. The KEM algorithm described 
in this paper is our attempt at addressing this shortcoming. 
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APPENDIX 



In this appendix, we derive the expressions for P${B) and Pa{B) given in Eq. ([20]) . We first determine Pi(B). 
When a zero mean white noise process r]'(t) with variance cr 2 / is added to Y(t) we get 

Y^ c \t) =Y(t)+r)'{t). (39) 
Applying (1 — dB) on both sides of the above equation we get 

(1 -dB)Y (c \t) = (l-dB)Y(t) + (l-dB)r)'(t) 

= r)(t) + (l-dB)r)'(t). (40) 

We now determine a white noise process rf^ (t) such that 

r)(t) + (1 - dB)rf{t) = (1 - d'B)rj^(t). (41) 

We need to determine d' and er 2 ,,., . 

Taking variances on both sides of the above equation we get 

a 2 + (l + d 2 K 2 , = (l + d> 2 (c) . (42) 

Taking autocovariance at lag 1 on both sides we obtain 

d<r 2 ,=dV 2 (o) . (43) 

Since rf^ is a sum of rj and (1 — dB)r]' , we have <7 2 (c) > cr 2 , . This implies that \d'\ < \d\. Since stationarity of the AR 
process requires < \d\ < 1, we obtain the inequality < \d'\ < \d'\ < 1. Further d' has the same sign as d. 
We have 

Substituting in the variance equation we get 

' d 



^)=44- (44) 



(l + d' 2 4a 2 , = (l + d 2 )a 2 ,+a 2 , (45) 



that is, 



Let 



. 1 .1 



s =(- + d) + -^-. 
a a at, 
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This gives 



Hence 



(i +<f) = s. (47) 



^£±^4 (48) 

Note that |s| > 2 for any value of d, a 2 , and a 2 ,. Therefore V s 2 — 4 and hence <i' are well defined. Further, since 

< \d\ if is positive, d' = (s — \/s 2 — 4)/2 is the only valid solution. If c? is negative, d' = (s + V« 2 — 4)/2 is the 
only valid solution. 

Next, we derive the expression for P^(B). First, we first need to rewrite X(t) as an univariate process i.e. we need 
to determine Pi(B): 

P 1 (B)X(t)=£(t), (49) 
where £(f) is a zero mean white noise process and 

X(t) = aX(t-l)+bY(t-l)+E 1 (t). (50) 
Here Ei(t) is a zero mean white noise process with variance a 2 . We have already seen that 

(l-dB)Y(t)= V (t). (51) 

The equation for X{t) can be written as 

(l-aB)X(t) = bY(t-l)+E 1 (t). (52) 
Substituting the expression for Y(t — 1) we obtain 

(1 - aB)X{t) = 6(1 - dB)- 1 ^ - 1) + £d(t). (53) 
We now find a white noise process with variance cr| such that 

6(1 - dB)- 1 ^ - 1) + £?!(t) = (1 - rB)- 1 ^*). (54) 
To determine r and er|, we take variance and autocovariance at lag 1 on both sides. Taking variance we obtain 



(l-d 2 ) e (1-r 2 )' 

Taking autocovariance at lag 1 and assuming that a ev (the cross-covariance between E\ and £) is zero for simplicity, 
we get 



b 2 a 2 d ai 



(l-d 2 ) (1-r 2 )' 

which can be written as 



b 2 a 2 d 



(1-r 2 ) (l-d 2 )r' 

Substituting in the variance equation we obtain 



<J 2 2 b 2 <7 2 d 



(l-d 2 ) £ (l-d 2 ) 
Thus 

b 2 da 2 



b 2 a 2 + (1 - d 2 )a 2 



(56) 



(57) 



(58) 



(59) 
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If b = 0, we get r — and cr| = af as expected. Similarly if d — 0, we get r = and a 2 — <r 2 + b 2 a 2 as expected. 
Once r is known, erf is given by 



4 = (l-r 2 ) 



We finally have 



That is, 



(1 - aB)X{t) = (1 - rS)-^(t). 
Pi(B)X(t) = £(t), = (1 - rB)(l - oB). 



(60) 



(61) 



(62) 



Consider a white noise process £'(t) (which is uncorrelated with X(i)) and has variance cr?,. This is added to X(t) 



to obtain the noisy process X^ c \t): 

X( c \t) = X{t)+?(t). 
Applying Pi{B) on both sides of the above equation, 

P 1 (B)xV(t)=m+Pi(B)H'(t). 
We need to find a zero mean white noise process £( c ) (t) with variance erf (c) such that 

m+Pim'(t) = p 3 (B)e c) (t). 

Let 

P 3 {B) = l + a l B + a 2 B 2 . 

We have 

f(t) + (1 - (a + r)B + arB 2 )e'(t) = [1 + a x B + a 2 B 2 ]t {c) (t) . 
Taking variances on both sides we get 

erf + (1 + (a + rf + a 2 r 2 )a 2 ( , = [1 + a[ 2 + a' 2 2 ]a 2 (c) . 
Taking autocovariance at lag 1 on both sides we obtain 

- (a + r)erf, — ar(a + r)a 2 , = a-^a 2 ^ + o 1 a 2 o|(c) • 

This can be rewritten as 

— (a + r)(l + ar)a 2 , = a 1 (l + a 2 )a| (c) . 
Taking autocovariance at lag 2 on both sides 

amf, = a 2 cr| (c) , 



which gives 



2 ar 2 



Since erf (c) > erf, , we see that \a 2 \ < \ar\ and a 2 has the same sign as 
Substituting the last equation in Eqs. ([70)1 and (|6"5|) we obtain 



(63) 
(64) 

(65) 
(66) 
(67) 
(68) 

(69) 
(70) 
(71) 
(72) 



(a + r)(l + ar)oJ, = a 1 (l + a 2 )—a 2 ,, 

a 2 



(73) 
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and 

of + [1 + (a + rf + a 2 r>f, = [1 + of + ^af, . (74) 

a 2 

Thus we get 

a'li 1 + 4) = (a + r)(l + ar) 
a 2 ar 

and 

(1 + af + qg) [l + (a + r) 2 + a V] | 

a 2 a r ar of, ' 

We can solve these two equations for a 1 and a 2 . There will be multiple solutions. We choose that solution for which 
\a 2 \ < \ar\. Further the solution has to be such that the roots of 1 + a x B + a 2 B 2 = lie outside the unit circle. The 
last condition is required for the invertibility of the MA process (l + a^B + a^B 2 )^^). The expressions for a 1 and 
a 2 obtained by solving the above equations are very long and therefore we do not list them here. However, we can 
easily obtain the asymptotic behaviour of these solutions as follows. 

For our bivariate AR(l)process to be stable, we require that the roots of 

det[A7 - A(l)] = (77) 

lie within the unit circle i.e., the eigenvalues of A(l) should have absolute value less than 1. In our case 



A(l) 



a b 
d 



which is an upper triangular matrix. Hence eigenvalues are a and d. Therefore, for stability we require that \a\ < 1 
and \d\ < 1. 

As already derived, we have 

- V^'-^i } (78) 

Since \d\ < 1, the term within brackets is always positive and less than 1. It becomes zero only when 6 = 0. Hence 
\r\ < \d\ and r has same sign as d. As \d\ — > 1, \r\ — > 1. As \d\ — > or \b\ — > 0, we see that \r\ — > 0. 

We have already seen that \a 2 \ < \ar\. Since |r| < \d\, we obtain further results that \a 2 \ < \a\\d\ and a 2 has same 
sign as ad. Since \a\, \ d\ < 1, we get 

< \a 2 \ < \a\\d\ < 1. 

As \a\, \d\ — » 1, \a 2 \ also — > 1. As a — > 1, d — * 1 and the ratio o-|/<r|, — > 0, we have 



As the variance ratio — ► oo 



-2;a 1 -» 1. 



0;a 2 -» 0, 



as expected. The parameter a{ is hardly affected by the value of the parameter b. On the other hand, a 2 —> as 
& — > and saturates rapidly for b > 0.5. 
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FIG. 1: Granger causality spectra for a bivariate AR(1) process (a) Causality of X — » Y (b) Causality of Y — > X. The solid 
lines represent true causality spectra and the dashed lines represent spectra from noisy data. 
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FIG. 2: Granger causality spectra for a bivariate AR(2) process (a) Causality of X — > Y (b) Causality of Y — > X. The solid 
lines represent true causality spectra and the dashed lines represent spectra from noisy data. 
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FIG. 3: Granger causality spectra for the bivariate AR(1) process in Fig 1. (a) Causality of X — » Y (b) Causality of Y — > X. 
The solid lines represent true causality spectra and the dashed lines represent spectra obtained from the denoised data using 
the KEM algorithm. The dotted lines represent spectra obtained using HOY algorithm. 
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FIG. 4: Granger causality spectra for the bivariate AR(2) process in Fig 2. (a) Causality of X — > Y (b) Causality of Y — > X. 
The solid lines represent true causality spectra and the dashed lines represent spectra obtained from the denoised data using 
the KEM algorithm. 
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FIG. 5: Granger causality spectra for noisy data from a neural network model (a) Causality of x\ — > X2 (b) Causality of 
X2 — ► xi. The solid lines represent true causality spectra (noise-free data) and the dashed lines represent spectra from noisy 
data. 




re 

m 1.5 

3 

(0 

O 
i_ 
v 

? 0.5 
(0 

O 



(b) 



■ After denoising 

■ Exact 



20 40 60 

Frequency (Hz) 



80 



100 



FIG. 6: Granger causality spectra of the neural network model (a) Causality of x\ — t X2 (b) Causality of X2 ->ii. The solid 
lines represent true causality spectra (noise-free data) and the dashed lines represent spectra obtained from denoised data using 
the KEM algorithm. 



16 




FIG. 7: Granger causality spectra of the neural network model for the direction x\ — > X2- The solid line represents the Granger 
causality for denoised data, while the dashed line represents the Granger causality for noisy data. 95% confidence intervals are 
also given.. 



